An Asymptotic Model for the English Hapax/Vocabulary Ratio

Author

  • Fan Fengxiang
Abstract

In the known literature, hapax legomena (words that occur only once) account for roughly 50% of the vocabulary of an English text or collection of texts. This apparent constancy is baffling. The 100-million-word British National Corpus was used to study this phenomenon. The results reveal that the hapax/vocabulary ratio follows a U-shaped pattern: initially, as text size increases, the ratio decreases; however, once the text size reaches about 3,000,000 words, the ratio starts to increase steadily. A computer simulation shows that as the text size continues to increase, the hapax/vocabulary ratio would approach 1.
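As a rough illustration of the quantity discussed in the abstract, the hapax/vocabulary ratio of a text can be computed by counting the word types that occur exactly once and dividing by the total number of word types. The sketch below is not the author's procedure: the function name `hapax_vocabulary_ratio` and the simple lowercase, letters-only tokenizer are assumptions made for illustration, and a real corpus study would need a more careful definition of a word.

```python
import re
from collections import Counter


def hapax_vocabulary_ratio(text: str) -> float:
    """Return (number of hapax legomena) / (vocabulary size) for a text.

    Assumption: a token is a lowercase run of ASCII letters, optionally
    followed by an apostrophe and more letters (e.g. "don't").
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(tokens)
    if not counts:
        raise ValueError("text contains no word tokens")
    # Hapax legomena are the word types with a frequency of exactly one.
    hapaxes = sum(1 for freq in counts.values() if freq == 1)
    # Vocabulary size is the number of distinct word types.
    return hapaxes / len(counts)


if __name__ == "__main__":
    sample = "the cat sat on the mat"
    # Vocabulary: the, cat, sat, on, mat (5 types); hapaxes: cat, sat, on, mat (4).
    print(hapax_vocabulary_ratio(sample))
```

Applying such a function to successively larger slices of a corpus is one way to trace how the ratio changes with text size, though the result depends on the tokenization chosen.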

Similar resources

Some macro quantitative features of low-frequency word classes

This contribution examines the macro quantitative features of 15 low-frequency word classes. The relationship between word frequency classes and the sizes of the frequency classes obeys Altmann’s power law, and the sizes of low-frequency word classes increase as text length increases. The relationship between text length and the sizes of low-frequency word classes also obeys Altmann...

Multiword Vocabulary in Japanese ESL Texts

In this paper we describe our analysis of vocabulary across three sets of Japanese ESL texts. We focus upon frequency analysis of individual words and multiword sequences (n-grams), giving cross comparisons of 2, 3 and 4-gram multiword sequences. In addition, we consider the degree of emphasis on multiword vocabulary that is evident in each textbook corpus. This is derived from analysis of the ...

Models of EFL Learners’ Vocabulary Development: Spreading Activation vs. Hierarchical Network Model

Semantic network approaches view the organization or representation of the internal lexicon as either a spreading or a hierarchical system, identified, respectively, as the Spreading Activation Model (SAM) and the Hierarchical Network Model (HNM). However, the validity of either model is among the unexplored issues in the literature, which can be studied through basing the instruction compatible wi...

An Overview of Vocabulary Learning Strategies in English as a Foreign Language

Researchers in the area of EFL learning have tried to frame the ways in which EFL learners learn English vocabulary and to present them as strategies. This paper reviews descriptive research on vocabulary learning strategies in English as a foreign language. The review focuses on common strategies that learners use in vocabulary learning, such as dictionary strategies, note-taking stra...

Comparative Study of the Academic Vocabulary Content of Electronic Engineering Corpora, GE Materials and M.S. Entrance Examinations

The importance of vocabulary learning has been underlined in the field of English for Academic Purposes (EAP) because non-English majors who require reading English texts in their fields of study have to expand their English vocabulary knowledge much more efficiently than ordinary ESL/EFL learners. Since academic vocabulary instruction in Iranian universities is realized through the use of Gene...

Journal title:
  • Computational Linguistics

Volume 36  Issue

Pages  -

Publication date 2010